Adversarial imitation learning (AIL) has become a popular alternative to supervised imitation learning that reduces the distribution shift suffered by the latter. However, AIL requires effective exploration during an online reinforcement learning phase. In this work, we show that the standard, naive approach to exploration can manifest as a suboptimal local maximum if a policy learned with AIL sufficiently matches the expert distribution without fully learning the desired task. This can be particularly catastrophic for manipulation tasks, where the difference between an expert and a non-expert state-action pair is often subtle. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple exploratory, auxiliary tasks in addition to a main task. The addition of these auxiliary tasks forces the agent to explore states and actions that standard AIL may learn to ignore. Additionally, this particular formulation allows for the reusability of expert data between main tasks. Our experimental results in a challenging multitask robotic manipulation domain indicate that LfGP significantly outperforms both AIL and behaviour cloning, while also being more expert-sample efficient than these baselines. To explain this performance gap, we provide further analysis of a toy problem that highlights the coupling between a local maximum and poor exploration, and also visualize the differences between the learned models from AIL and LfGP.
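For reference, below is a minimal sketch of the standard GAIL-style AIL reward signal that this failure mode concerns; the class and function names are illustrative and not taken from the LfGP implementation.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Classifies (state, action) pairs as expert (1) or policy (0)."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def ail_reward(disc: Discriminator, obs, act):
    # GAIL-style surrogate reward: high wherever the policy's (s, a)
    # distribution already resembles the expert's -- which is exactly
    # why exploration can stall once the match is "good enough".
    logits = disc(obs, act)
    # softplus(logits) = -log(1 - D(s, a)) with D = sigmoid(logits)
    return -torch.nn.functional.logsigmoid(-logits)
```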
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%), and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once, which was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Reducing the quantity of annotations required for supervised training is vital when labels are scarce and costly. This reduction is especially important for semantic segmentation tasks involving 3D datasets, which are often significantly smaller and more challenging to annotate than their image-based counterparts. Self-supervised pre-training on large unlabelled datasets is one way to reduce the amount of manual annotation needed. Previous work has focused on pre-training with point cloud data exclusively; this approach often requires two or more registered views. In the present work, we combine image and point cloud modalities by first learning self-supervised image features and then using these features to train a 3D model. By incorporating image data, which many 3D datasets already include, our pre-training method requires only a single scan of a scene. We demonstrate that our pre-training approach, despite using single scans, achieves comparable performance to other multi-scan, point cloud-only methods.
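A minimal sketch of this kind of 2D-to-3D pre-training, assuming a frozen self-supervised image encoder, a pinhole camera model, and hypothetical function names (none taken from the paper's code):

```python
import torch
import torch.nn.functional as F

def project_points(points, K, T_cam):
    """Pinhole projection of N x 3 world points into pixel coordinates.

    K: 3 x 3 intrinsics; T_cam: 4 x 4 world-to-camera transform.
    """
    pts_h = torch.cat([points, torch.ones_like(points[:, :1])], dim=1)  # N x 4
    cam = (T_cam @ pts_h.T).T[:, :3]                                    # N x 3 camera frame
    uv = (K @ cam.T).T                                                  # N x 3
    return uv[:, :2] / uv[:, 2:3]                                       # N x 2 pixels

def distillation_loss(backbone3d, image_feats, points, K, T_cam):
    """Pull per-point 3D features toward frozen self-supervised 2D features.

    image_feats: 1 x C x H x W map from a frozen 2D encoder.
    backbone3d:  maps N x 3 points to N x C features.
    """
    uv = project_points(points, K, T_cam)
    H, W = image_feats.shape[-2:]
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1).view(1, 1, -1, 2)
    target = F.grid_sample(image_feats, grid, align_corners=True)  # 1 x C x 1 x N
    target = target.squeeze(0).squeeze(1).T                        # N x C
    pred = backbone3d(points)                                      # N x C
    return 1 - F.cosine_similarity(pred, target.detach(), dim=-1).mean()
```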
Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for robotic manipulation. Existing numerical solvers are broadly applicable but rely on local search techniques to manage highly nonconvex objective functions. Recently, learning-based approaches have shown promise as a means of generating fast and accurate IK results; learned solvers can also be easily integrated with other learning algorithms in end-to-end systems. However, learning-based methods have an Achilles' heel: each robot of interest requires a specialized model that must be trained from scratch. To address this key shortcoming, we investigate a novel distance-geometric robot representation coupled with a graph structure that allows us to leverage the flexibility of graph neural networks (GNNs). We use this approach to train the first learned generative graphical inverse kinematics (GGIK) solver that is, crucially, "robot-agnostic": a single model is able to provide IK solutions for a variety of different robots. Additionally, the generative nature of GGIK allows the solver to produce a large number of diverse solutions in parallel with minimal additional computation time, making it appropriate for applications such as sampling-based motion planning. Finally, GGIK can complement local IK solvers by providing reliable initializations. These advantages, along with the ability to use task-relevant priors and to continuously improve with new data, suggest that GGIK has the potential to be a key component of flexible, learning-based robotic manipulation systems.
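To illustrate how a distance-geometric graph lends itself to a robot-agnostic model, here is a minimal message-passing layer over such a graph; it is a simplification, not the GGIK architecture (which is generative), and all names are hypothetical:

```python
import torch
import torch.nn as nn

class DistanceGeometryGNNLayer(nn.Module):
    """One message-passing step over a robot's point graph.

    Nodes are points attached to the robot; edges carry the known
    inter-point distances that encode link geometry (and, at the
    end-effector, the task constraint). Because only distances appear,
    the same layer applies to any robot -- the 'robot-agnostic' property.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.ReLU())
        self.upd = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, h, edges, dists):
        # h: N x dim node embeddings; edges: E x 2 index pairs;
        # dists: E x 1 known distances (the edge features).
        src, dst = edges[:, 0], edges[:, 1]
        m = self.msg(torch.cat([h[src], h[dst], dists], dim=-1))  # E x dim
        agg = torch.zeros_like(h).index_add_(0, dst, m)           # sum messages per node
        return self.upd(torch.cat([h, agg], dim=-1))
```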
Obtaining 3D object representations is important for creating photo-realistic simulators and collecting assets for AR/VR applications. Neural fields have shown their effectiveness in learning a continuous volumetric representation of a scene from 2D images, but acquiring object representations from these models with weak supervision remains an open challenge. In this paper, we introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images with known camera poses, a natural-language description of the object, and a small number of object and non-object point labels in the input images. To faithfully extract the object from the scene, LaTeRF extends the NeRF formulation with an additional "objectness" probability at each 3D point. Additionally, we leverage the rich latent space of a pre-trained CLIP model, combined with our differentiable object renderer, to inpaint the occluded parts of the object. We demonstrate high-fidelity object extraction on both synthetic and real datasets and justify our design choices through an extensive ablation study.
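A minimal sketch of the core architectural idea, assuming a simplified NeRF MLP without view dependence; names and sizes are illustrative rather than taken from LaTeRF:

```python
import torch
import torch.nn as nn

class ObjectnessNeRF(nn.Module):
    """NeRF-style MLP with an extra per-point 'objectness' probability head."""
    def __init__(self, in_dim: int = 63, hidden: int = 256):
        # in_dim = 63 assumes classic positional encoding of a 3D point.
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)  # volume density
        self.rgb_head = nn.Linear(hidden, 3)    # colour
        self.obj_head = nn.Linear(hidden, 1)    # objectness logit

    def forward(self, x):
        h = self.trunk(x)
        sigma = torch.relu(self.sigma_head(h))
        rgb = torch.sigmoid(self.rgb_head(h))
        p_obj = torch.sigmoid(self.obj_head(h))
        return sigma, rgb, p_obj

def object_density(sigma, p_obj):
    # For object-only rendering, attenuate density by the objectness
    # probability so non-object points become transparent; the point
    # labels supervise p_obj, and a CLIP similarity loss on the rendered
    # object can drive inpainting of occluded regions (per the abstract).
    return sigma * p_obj
```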
The ability to perceive object slip via tactile feedback enables humans to accomplish complex manipulation tasks, including maintaining a stable grasp. Despite the utility of tactile information for many applications, tactile sensors have yet to be widely deployed in industrial robotics settings. Part of the challenge lies in identifying slip and other events from the tactile data stream. In this paper, we present a learning-based method to detect slip using barometric tactile sensors. These sensors have many desirable properties, including high durability and reliability, and are built from inexpensive, off-the-shelf components. We train a temporal convolutional neural network to detect slip, achieving high detection accuracy while demonstrating robustness to the speed and direction of the slip motion. Further, we test our detector on two manipulation tasks involving a variety of common objects and demonstrate successful generalization to real-world scenarios not seen during training. We argue that barometric tactile sensing technology, combined with data-driven learning, is suitable for many manipulation tasks such as slip compensation.
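A minimal sketch of a dilated temporal convolutional classifier of the kind described, assuming a fixed window of per-taxel pressure signals; the architectural details are illustrative, not the paper's exact network:

```python
import torch
import torch.nn as nn

class SlipTCN(nn.Module):
    """Dilated temporal convolutions over a window of tactile readings."""
    def __init__(self, n_taxels: int, hidden: int = 64):
        super().__init__()
        layers, ch = [], n_taxels
        for d in (1, 2, 4, 8):  # exponentially growing receptive field
            layers += [nn.Conv1d(ch, hidden, kernel_size=3,
                                 dilation=d, padding=d),  # preserves length
                       nn.ReLU()]
            ch = hidden
        self.tcn = nn.Sequential(*layers)
        self.head = nn.Linear(hidden, 1)  # slip / no-slip logit

    def forward(self, x):
        # x: batch x n_taxels x T window of barometric pressure signals
        h = self.tcn(x)               # batch x hidden x T
        return self.head(h[..., -1])  # classify using the latest time step
```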
Efficient exploration remains a significant challenge that prevents the deployment of reinforcement learning for many physical systems. This is especially true for systems with continuous and high-dimensional state and action spaces, such as robotic manipulators. The challenge is accentuated in sparse-reward settings, where the low-level state information required for the design of dense rewards is unavailable. Adversarial imitation learning (AIL) can partially overcome this barrier by leveraging expert-generated demonstrations of optimal behaviour, essentially providing a replacement for dense reward information. Unfortunately, the availability of expert demonstrations does not necessarily improve an agent's ability to explore effectively and, as we demonstrate empirically, can lead to inefficient or stagnated learning. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple auxiliary tasks in addition to a main task. Subsequently, a hierarchical model is used to learn each task's reward and policy through a modified AIL procedure, in which exploration of all tasks is enforced via a scheduler that composes the different tasks together. This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible. Our experimental results in a challenging multitask robotic manipulation domain indicate that our method compares favourably to supervised imitation learning and to a state-of-the-art AIL method. Code is available at https://github.com/utiasstars/lfgp.
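A minimal sketch of the scheduler idea, assuming a weighted-random scheduler over per-task policies; the environment interface and all names are hypothetical, not taken from the linked repository:

```python
import random

class WeightedRandomScheduler:
    """Picks which task's policy controls the robot for the next k steps.

    A stand-in for LfGP's scheduler: by forcing the agent to pursue
    auxiliary tasks (e.g., reach, grasp, lift) throughout an episode,
    it visits states that main-task-only AIL would learn to ignore.
    """
    def __init__(self, tasks, weights, steps_per_choice: int = 45):
        self.tasks, self.weights = tasks, weights
        self.k = steps_per_choice

    def run_episode(self, env, policies, horizon: int = 360):
        # env follows the classic gym API: step -> (obs, reward, done, info).
        obs, t = env.reset(), 0
        while t < horizon:
            task = random.choices(self.tasks, weights=self.weights)[0]
            for _ in range(min(self.k, horizon - t)):
                obs, _, done, _ = env.step(policies[task](obs))
                t += 1
                if done:
                    return
```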
Ideally, robots should move in a way that maximizes the knowledge gained about the state of both their internal system and the external operating environment. Trajectory design is a challenging problem that has been investigated from a variety of perspectives, ranging from information-theoretic analyses to learning-based approaches. Recently, observability-based metrics have been proposed to find trajectories that enable rapid and accurate state and parameter estimation, but the viability and efficacy of these methods are not yet well understood in the literature. In this paper, we compare two state-of-the-art methods for observability-aware trajectory optimization and seek to add important theoretical clarification and valuable discussion about their overall effectiveness. For evaluation, we examine the representative task of sensor-to-sensor extrinsic self-calibration using a realistic physics simulator. We also study the sensitivity of these algorithms to changes in the information content of the exteroceptive sensor measurements.
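As context for what observability-based metrics quantify, here is a minimal numerical sketch of an empirical (local) observability Gramian, with a hypothetical simulate interface:

```python
import numpy as np

def empirical_observability_gramian(simulate, x0, eps: float = 1e-4):
    """Central-difference estimate of the local observability Gramian.

    simulate(x0) must return the stacked measurement sequence (an
    m-vector) produced along the candidate trajectory when the system
    starts from state x0. Larger minimum eigenvalues of the Gramian
    indicate trajectories from which the state is easier to estimate,
    which is the kind of quantity observability-aware methods maximize.
    """
    n = x0.size
    cols = []
    for i in range(n):
        d = np.zeros(n)
        d[i] = eps
        cols.append((simulate(x0 + d) - simulate(x0 - d)) / (2 * eps))
    J = np.stack(cols, axis=1)  # m x n sensitivity of measurements to x0
    return J.T @ J              # n x n empirical Gramian
```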
Inverse kinematics (IK) is the problem of finding robot joint configurations that satisfy constraints on the position or pose of one or more end-effectors. For robots with redundant degrees of freedom, there is often an infinite, nonconvex set of solutions. The IK problem is further complicated when collision avoidance constraints are imposed by obstacles in the workspace. In general, closed-form expressions yielding feasible configurations do not exist, motivating the use of numerical solution methods. However, these approaches rely on local optimization of nonconvex problems and often require an accurate initialization, or numerous re-initializations, to converge to a valid solution. In this work, we first formulate inverse kinematics with complex workspace constraints as a convex feasibility problem whose low-rank feasible points provide exact IK solutions. We then present CIDGIK (Convex Iteration for Distance-Geometric Inverse Kinematics), an algorithm that solves this feasibility problem with a sequence of semidefinite programs whose objectives are designed to encourage low-rank minimizers. Our problem formulation elegantly unifies the configuration space and workspace constraints of a robot: intrinsic robot geometry and obstacle avoidance are both expressed as simple linear matrix equations and inequalities. Our experimental results for a variety of popular manipulator models demonstrate faster and more accurate convergence than a conventional nonlinear-optimization-based approach, especially in environments with many obstacles.
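A minimal sketch of the convex iteration scheme for encouraging low-rank solutions, using cvxpy and a hypothetical constraints_fn that supplies the (linear in G) distance and workspace constraints:

```python
import cvxpy as cp
import numpy as np

def convex_iteration(constraints_fn, n: int, rank: int = 3, iters: int = 20):
    """Rank-constrained feasibility via a sequence of SDPs (convex iteration).

    constraints_fn(G) returns linear constraints on the Gram matrix G
    encoding known inter-point distances and workspace limits. Each SDP
    minimizes tr(W G); updating W from the eigenvectors of the n - rank
    smallest eigenvalues of G pushes those eigenvalues toward zero,
    i.e., toward a rank-`rank` (physically embeddable) solution.
    """
    W = np.eye(n)
    for _ in range(iters):
        G = cp.Variable((n, n), PSD=True)
        prob = cp.Problem(cp.Minimize(cp.trace(W @ G)), constraints_fn(G))
        prob.solve()  # requires an SDP-capable solver (e.g., SCS)
        evals, evecs = np.linalg.eigh(G.value)  # ascending eigenvalues
        U = evecs[:, : n - rank]                # span of smallest eigenvalues
        W = U @ U.T                             # new direction matrix
        if evals[: n - rank].sum() < 1e-8:      # numerically rank-`rank`
            break
    return G.value
```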
Solving the inverse kinematics problem is a fundamental challenge in motion planning, control, and calibration for articulated robots. Kinematic models for these robots are typically parametrized by joint angles, generating a complicated mapping between robot configurations and end-effector poses. Alternatively, the kinematic model and task constraints can be represented using invariant distances between points attached to the robot. In this paper, we formalize the equivalence between distance-based inverse kinematics and the distance geometry problem for a large class of articulated robots and task constraints. Unlike previous approaches, we use the connection between distance geometry and low-rank matrix completion to find inverse kinematics solutions by completing a partial Euclidean distance matrix through local optimization. Furthermore, we parametrize the space of Euclidean distance matrices with the Riemannian manifold of fixed-rank Gram matrices, allowing us to leverage a variety of mature Riemannian optimization methods. Finally, we show that bound smoothing can be used to generate informed initializations without significant computational overhead, improving convergence. We demonstrate that our inverse kinematics solver achieves higher success rates than traditional techniques and substantially outperforms them on problems that involve many workspace constraints.
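A minimal sketch of the underlying low-rank distance-matrix completion, substituting plain gradient descent on an explicit Gram factor for the paper's Riemannian optimization; all names are hypothetical:

```python
import numpy as np

def complete_edm(known, n: int, dim: int = 3, lr: float = 0.01, steps: int = 2000):
    """Low-rank Euclidean distance matrix completion, minimal sketch.

    known: dict mapping index pairs (i, j) to measured distances (link
    lengths and task constraints). Instead of the Riemannian machinery
    over fixed-rank Gram matrices, this sketch optimizes an explicit
    rank-`dim` factor P (points as rows, so G = P P^T) by gradient
    descent -- the same search space, with a simpler update rule.
    """
    P = np.random.randn(n, dim)
    for _ in range(steps):
        grad = np.zeros_like(P)
        for (i, j), d in known.items():
            diff = P[i] - P[j]
            err = diff @ diff - d ** 2  # squared-distance residual
            grad[i] += 4 * err * diff
            grad[j] -= 4 * err * diff
        P -= lr * grad
    return P  # recovered point positions; joint angles follow by geometry
```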